pacman::p_load(arrow, gifski, lubridate, maptools, tidyverse, tmap,
raster, reshape2, sf, sp, spatstat, spNetwork)Take-home Exercise 1: Application of Spatial Point Patterns Analysis to discover the geographical distribution of Grab hailing services in Singapore
1.0 Introduction
1.1. Overview - Setting the Scene
Human mobility, the movement of human beings in space and time, reflects the spatial-temporal characteristics of human behavior. With the advancement Information and Communication Technologies (ICT) especially smart phone, a large volume of data related to human mobility have been collected. By using appropriate GIS analysis methods, these data are potentially useful in supporting smart city planning and management.
In Singapore, one of the important source of data related to human mobility is from Land Transport Authority (LTA) DataMall. Two data sets related to human mobility are provided by the portal, they are: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops. One of the limitation of these data sets is that their location are biased to either bus stops or MRT/LRT stations. In 2020, another very interesting human mobility data set called Grab Posisi was released by GRAB, one of the largest shared taxi operator in South-east Asia. There are two data sets been released and one of them is for Singapore.
1.2 Objectives
Geospatial analytics hold tremendous potential to address complex problems facing society.
In this study, we will be working on the following tasks:
Apply appropriate spatial point patterns analysis methods
Discover the geographical and spatial-temporal distribution of Grab hailing services locations in Singapore
1.3 Getting Started
In this take-home exercise, we will be using the following packages:
| Package Name | Description |
|---|---|
| arrow | arrow for reading in parquet files |
| gifski | gifski for rendering GIF animations |
| lubridate | lubridate for working with datetimes |
| maptools | maptools for manipulating geospatial data Note: Package |
| tidyverse | tidyverse for performing data science tasks such as importing, wrangling and visualising data |
| tmap | tmap for creating thematic maps |
| raster | raster for writing, manipulating, analyzing and modeling gridded spatial data |
| sf | sf for handling geospatial data |
| sp | sp for managing SpatialPointsDataFrame, SpatialLinesDataFrame and performing projection transformation |
| spatstat | spatstat for performing point pattern analysis and deriving the kernel density estimation (KDE) layer |
| spNetwork | spNetwork for performing spatial analysis on network |
Let’s load in the packages using p_load() of pacman package.
2.0 Data Acquisition
We will be using 3 data sets in this exercise:
| Data | Format | Description | Source |
|---|---|---|---|
| Grab Posisi | Parquet | Grab taxi location points either by origins or destinations | Grab |
| Road | Shapefile | Road layer within Singapore excluding outer islands | Geofabrik download server |
| Master Plan 2019 Subzone Boundary (No Sea) | GeoJSON | Singapore boundary layer excluding outer islands | Data.gov.sg |
2.1 Extracting Geospatial and Aspatial Data Sets
Start by creating a new folder labeled Take-home_Ex01. Within this folder, create a sub-folder named data. Inside the data sub-folder, create two additional sub-folders and rename them geospatial and aspatial respectively.
Unzip the malaysia-singapore-brunei-latest-free.shp.zip and MPSZ-2019.zip folders and place all files into geospatial sub-folder.
Place all files from GrabPosisi into aspatial sub-folder.
Did you observe that the file names from GrabPosisi are quite lengthy? Shortening them could make processing more convenient later on.
Alternatively, we can list.files() to get a list of filenames that contains .parquet extension.
3.0 Geospatial Data Wrangling
3.1 Data Aggregation: Importing and Combining Aspatial parquet files
# Use list.files to get a list of filenames that match the pattern
parquet_files <- list.files(path = "data/aspatial", pattern = "\\.parquet$", full.names = TRUE)
grab_data <- data.frame()
for (file_path in parquet_files) {
grab_data <- bind_rows(grab_data, read_parquet(file_path))
}3.2 Data Export: Writing DataFrame to RDS file
write_rds(grab_data, "data/rds/grab_data.rds")3.3 Data Import
3.3.1 Importing Aspatial Data - Grab Posisi data in RDS format
grab_df <- read_rds("data/rds/grab_data.rds")str(grab_df)'data.frame': 30329685 obs. of 9 variables:
$ trj_id : chr "70014" "73573" "75567" "1410" ...
$ driving_mode : chr "car" "car" "car" "car" ...
$ osname : chr "android" "android" "android" "android" ...
$ pingtimestamp: int 1554943236 1555582623 1555141026 1555731693 1555584497 1555395258 1554768955 1554783532 1554898418 1555593189 ...
$ rawlat : num 1.34 1.32 1.33 1.26 1.28 ...
$ rawlng : num 104 104 104 104 104 ...
$ speed : num 18.9 17.7 14 13 14.8 ...
$ bearing : int 248 44 34 181 93 73 82 321 324 31 ...
$ accuracy : num 3.9 4 3.9 4 3.9 ...
What we can observe is that the grab data contains 30329685 observations and 9 variables. Notice that pingtimestamp is in the wrong format. It should be in date/time format and not integer. We will need to convert the data type in the next section (Data Preparation).
3.3.2 Importing Geospatial Data - Road data in shapefile format
We can import geospatial data into RStudio using st_read() of sf package. Let’s try it now!
roads_sf <- st_read(dsn="data/geospatial",
layer="gis_osm_roads_free_1")Reading layer `gis_osm_roads_free_1' from data source
`C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS: WGS 84
str(roads_sf)Classes 'sf' and 'data.frame': 1759836 obs. of 11 variables:
$ osm_id : chr "4386520" "4578273" "4579495" "4579533" ...
$ code : int 5113 5114 5122 5122 5122 5122 5141 5122 5122 5122 ...
$ fclass : chr "primary" "secondary" "residential" "residential" ...
$ name : chr "Orchard Road" "Jalan Bukit Bintang" "Jalan Nagasari" "Persiaran Raja Chulan" ...
$ ref : chr NA NA NA NA ...
$ oneway : chr "F" "F" "B" "B" ...
$ maxspeed: int 50 0 0 0 0 0 0 0 0 0 ...
$ layer : num 0 0 0 0 0 0 -1 0 0 0 ...
$ bridge : chr "F" "F" "F" "F" ...
$ tunnel : chr "F" "F" "F" "F" ...
$ geometry:sfc_LINESTRING of length 1759836; first list element: 'XY' num [1:2, 1:2] 103.83 103.83 1.31 1.31
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
..- attr(*, "names")= chr [1:10] "osm_id" "code" "fclass" "name" ...
3.3.3 Importing Geospatial Data - Master Plan 2019 Subzone Boundary (No Sea) in shapefile format
mpsz_sf <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019")Reading layer `MPSZ-2019' from data source
`C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
str(mpsz_sf)Classes 'sf' and 'data.frame': 332 obs. of 7 variables:
$ SUBZONE_N : chr "MARINA EAST" "INSTITUTION HILL" "ROBERTSON QUAY" "JURONG ISLAND AND BUKOM" ...
$ SUBZONE_C : chr "MESZ01" "RVSZ05" "SRSZ01" "WISZ01" ...
$ PLN_AREA_N: chr "MARINA EAST" "RIVER VALLEY" "SINGAPORE RIVER" "WESTERN ISLANDS" ...
$ PLN_AREA_C: chr "ME" "RV" "SR" "WI" ...
$ REGION_N : chr "CENTRAL REGION" "CENTRAL REGION" "CENTRAL REGION" "WEST REGION" ...
$ REGION_C : chr "CR" "CR" "CR" "WR" ...
$ geometry :sfc_MULTIPOLYGON of length 332; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:300, 1:2] 104 104 104 104 104 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
..- attr(*, "names")= chr [1:6] "SUBZONE_N" "SUBZONE_C" "PLN_AREA_N" "PLN_AREA_C" ...
We can observe that both roads_sf and mpsz_sf are currently using the WGS 84 geographic coordinate system.
After looking at the file structure and contents of the 3 datasets, we need to perform the following preparations:
grab_dfConverting data type
Extracting trip starting location
Converting aspatial data into geospatial data
Getting grab data points within Downtown Core
roads_sfandmpsz_sfPerforming projection transformations
Extracting study area
Creating owin object
Getting road layer within Singapore excluding outer islands
3.4 Data Preparation
grab_df
3.4.1 Preparing and Converting grab_df into grab_sf (sf tibble data.framedata)
# Converting data type using as_datetime()
grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)
# Checking first n rows of data frame using head()
head(grab_df)
grab_origins_sf <- grab_df %>%
group_by(trj_id) %>%
arrange(pingtimestamp) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
start_hr = hour(pingtimestamp),
day = mday(pingtimestamp),
date_col = date(pingtimestamp)) %>%
st_as_sf(coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(grab_origins_sf)- 1
-
Converting data type for
pingtimestampfrom integer into datetime using as_datetime() from lubridate package - 2
- By default, the head() returns the first 6 rows
- 3
-
Converting
grab_dfinto sf tibble data.frame using st_as_sf() and its location information - 4
- Transforming coordinates using st_transform()
trj_id driving_mode osname pingtimestamp rawlat rawlng speed
1 70014 car android 2019-04-11 00:40:36 1.342326 103.8890 18.91000
2 73573 car android 2019-04-18 10:17:03 1.321781 103.8564 17.71908
3 75567 car android 2019-04-13 07:37:06 1.327088 103.8613 14.02155
4 1410 car android 2019-04-20 03:41:33 1.262482 103.8238 13.02652
5 4354 car android 2019-04-18 10:48:17 1.283799 103.8072 14.81294
6 32630 car android 2019-04-16 06:14:18 1.300330 103.9062 23.23818
bearing accuracy
1 248 3.9
2 44 4.0
3 34 3.9
4 181 4.0
5 93 3.9
6 73 3.9
Geometry set for 28000 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3628.243 ymin: 25198.14 xmax: 49845.23 ymax: 49689.64
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
3.4.2 Creating ppp objects
grab_origins_ppp <- grab_origins_sf %>%
as('Spatial') %>%
as('SpatialPoints') %>%
as('ppp')
summary(grab_origins_ppp)Planar point pattern: 28000 points
Average intensity 2.473666e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
(46220 x 24490 units)
Window area = 1131920000 square units
The presence of duplicates can affect the spatial point pattern analysis. Let us check if our grab_origins_ppp contains any duplicated points.
# Checking for duplicated points
any(duplicated(grab_origins_ppp))[1] FALSE
plot(grab_origins_ppp)
3.4.3 Performing projection transformation
To perform projection transformation, we will use st_transform(). Additionally, we will also use st_geometry() to inspect the contents of roads_sf and mpsz_sf after the projection transformation.
roads_sf
# Transforming coordinates using st_transform()
roads_sf <- st_transform(roads_sf,
crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(roads_sf)Geometry set for 1759836 features
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -434085.6 ymin: -23036.54 xmax: 1759022 ymax: 741993.8
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
mpsz_sf
# Transforming coordinates using st_transform()
mpsz_sf <- st_transform(mpsz_sf,
crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(mpsz_sf)Geometry set for 332 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
Now, we observe that both roads_sf and mpsz_sf are using the SVY21 projected coordinates system.
3.4.4 Extracting specific planning area - Downtown Core
Given the high computational demands associated with analyzing the entire region of Singapore, our focus will be specifically on the Downtown Core planning area. This targeted approach aims to explore spatial data points within the Central Business District areas.
# extract Downtown core planning area
sg_downtown_sf <- mpsz_sf %>%
filter(PLN_AREA_N == "DOWNTOWN CORE")
# plot the mpsz with downtown core
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(sg_downtown_sf) +
tm_fill(col = "pink") +
tm_layout(main.title = "Map of Singapore by planning subzone",
main.title.position = "center",
main.title.size = 1.2) +
tm_view(set.zoom.limits = c(11, 16))
And here’s how the zoom in version of Downtown core planning area would look like
plot(sg_downtown_sf["SUBZONE_N"], main = "DOWNTOWN CORE", col = "pink")
3.4.5 Creating owin object - Downtown Core
We are creating an owin object to confine the analysis to Downtown Core planning area. After creating this object, we use the summary() to get a quick overview of its key features.
downtown_owin <- as.owin(sg_downtown_sf)
summary(downtown_owin)Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
(3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
plot(downtown_owin)
3.4.6 Combining grab data points with study area - Downtown Core
grab_origins_dt_ppp <- grab_origins_ppp[downtown_owin]
plot(grab_origins_dt_ppp, main="DOWNTOWN CORE")
3.4.7 Getting grab data points within Downtown Core
To obtain grab data points within the Downtown Core, we can apply st_intersection() from the sf package to ensure exclusion of other planning areas and outer islands in the analysis.
sg_downtown_grab_sf <- st_intersection(grab_origins_sf, sg_downtown_sf)3.4.8 Getting road layer within Singapore excluding outer islands
Similarly, to get the road layer within Downtown Core (excluding other planning areas and outer islands), st_intersection() from the sf package will be used.
sg_downtown_road_sf <- st_intersection(roads_sf, sg_downtown_sf)
sg_downtown_road_sf <- st_cast(sg_downtown_road_sf, "LINESTRING")4.0 1st Order Spatial Point Pattern Analysis (SPPA)
Now, we will be performing 1st order spatial point pattern analysis (SPPA) using spatstat package.
4.1 Kernel Density Estimation (KDE)
We will employ Kernel Density Estimation (KDE) to visualize the spatial distribution patterns of the initial trajectory locations of Grab hailing services (Grab hailing service pickups). This analysis aims to identify regions exhibiting a high concentration of these starting trajectory locations. KDE is a statistical method that allows us to create a smooth representation of the data, helping us uncover areas with dense clusters of Grab hailing service pickups.
We will calculate the KDE using the bw.diggle bandwidth parameter and the default smoothing kernel - gaussian. As the default unit of measurement for SVY 21 is in meters, the resulting density values will be expressed in terms of points per square meter. To facilitate a more comprehensible interpretation, we will employ the rescale() to convert the measurement unit from meters to kilometers. This conversion ensures that the density values represent the number of points per square kilometer, providing a more intuitive understanding of the spatial patterns in the data.
par(mfrow=c(2,2))
# KDE with default values
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")
# Rescalling KDE values
grab_origins_dt_ppp.km <- rescale(grab_origins_dt_ppp, 1000, "km")
# KDE with rescaled values
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")
Upon comparing the two plots above, the differences in the legends become apparent. It is important to highlight that the resulting plots look the same, with the only differences being reflected in the data values specified in the legends.
4.1.1 Working with different automatic bandwidth methods
par(mfrow=c(1,4))
# Choosing automatic bandwidth
plot(density(grab_origins_dt_ppp.km,
sigma=bw.CvL,
edge=TRUE,
kernel="gaussian"),
main="Using bw.Cv")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.scott,
edge=TRUE,
kernel="gaussian"),
main="Using bw.scott")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Using bw.diggle")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Using bw.ppl")
4.1.1 Working with different kernels
par(mfrow=c(1,4))
# Choosing kernels
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Using gaussian")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="epanechnikov"),
main="Using epanechnikov")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="quartic"),
main="Using quartic")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="disc"),
main="Using Disc")
Since we did not observe any significant differences across the kernels, we will use the default Gaussian kernel. Thus, we will be using Gaussian kernel with bw.ppl() as our final KDE plot.
kde_grab_origins_dt_bw_ppl <- density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
plot(kde_grab_origins_dt_bw_ppl)
4.1.2 Display KDE Maps on OpenStreetMap
4.1.2.1 Convert to raster for tmap display
gridded_kde_grab_origins_dt_bw_ppl <- as.SpatialGridDataFrame.im(kde_grab_origins_dt_bw_ppl)
raster_kde_grab_origins_dt_bw_ppl <- raster(gridded_kde_grab_origins_dt_bw_ppl)spplot(gridded_kde_grab_origins_dt_bw_ppl)
Let us take a look at the properties of raster_kde_grab_origins_dt_bw_ppl RasterLayer.
raster_kde_grab_origins_dt_bw_pplclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.02409239, 0.03079135 (x, y)
extent : 28.89626, 31.98009, 27.91419, 31.85548 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -2.057805e-13, 2788.346 (min, max)
Notice that the crs property is NA.
4.1.2.2 Assigning projection systems
projection(raster_kde_grab_origins_dt_bw_ppl) <- CRS("+init=EPSG:3414 +units=km")
raster_kde_grab_origins_dt_bw_pplclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.02409239, 0.03079135 (x, y)
extent : 28.89626, 31.98009, 27.91419, 31.85548 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=km +no_defs
source : memory
names : v
values : -2.057805e-13, 2788.346 (min, max)
Notice that the crs property is completed.
4.1.2.3 Display on tmap OpenStreetMap
tmap_mode('view')
tm_basemap('OpenStreetMap') +
tm_shape(raster_kde_grab_origins_dt_bw_ppl) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
tm_shape(sg_downtown_sf) +
tm_borders() +
tm_text("SUBZONE_N", size = 0.65)tmap_mode('plot')Interpretations of Spatial Analysis from KDE map🔎:
The specific subzones that have relatively higher concentration are
Cecil,Raffles PlaceandBugis. They are denoted by the darker green colorSubzones like
PhilipandMaxwellhave relatively lower concentration. They are denoted by lighter yellow colorFrom KDE map, we can only infer if a particular area / subzone area is highly or lessly concentrated but not specific road segments
Despite experimenting with various Kernel Density Estimation (KDE) methods and parameters, the interpretability of the generated plots remain constrained by a fundamental limitation of KDE, that is, the inability to comprehensively account for the intricate structure of road networks. The complexity of urban road layouts is not effectively captured by traditional KDE approaches, leading to a less precise representation of spatial patterns. As a consequence, the visualizations may not offer a nuanced understanding of the spatial dynamics, particularly in urban environments with intricate road networks.
4.2 Network Kernel Density Estimation (NKDE)
As a response to the limitation of KDE, we turn to NKDE. Network KDE considers the constraints and pathways defined by the road network, addressing the shortcomings of KDE by offering a more nuanced and accurate depiction of the concentrated areas for Grab hailing service pickups within urban landscapes.
4.2.1 Visualizing the Geospatial data
Let us visualize the geospatial data first.
tmap_mode('view')
tm_shape(sg_downtown_sf) +
tm_fill(col = "SUBZONE_N", alpha = 0.5) +
tm_shape(sg_downtown_road_sf) +
tm_lines() +
tm_shape(sg_downtown_grab_sf) +
tm_dots()